Optimized artificial neural networks

ABSTRACT

Neural network architectures are represented by symbol strings. An initial population of networks is trained and evaluated. The strings representing the fittest networks are modified according to a genetic algorithm and the process is repeated until an optimized network is produced.

BACKGROUND OF THE INVENTION

[0001] The present invention relates to artificial neural networks, and more particularly to artificial neural networks having an architecture optimized by the use of a genetic algorithm.

[0002] The term artificial neural network is used herein to describe a highly connected network of artificial neurons. For simplicity, the modifier “artificial” will usually be omitted.

[0003] Individually, artificial neurons have simply described behavior. They are threshold circuits that receive input signals and develop one or more output signals. The input-output relationship of a neuron is determined by a neuron activation or threshold function applied to the sum of the input signals. The activation function may be a simple step function, a sigmoid function or some other monotonically increasing function.

[0004] Neurons are combined into highly connected networks by signal transmission paths to form neural networks. Each signal transmission path has a weight associated with it, so that a signal delivered to a neuron along a path has a strength equal to the product of the signal applied to that path and the weight of that path. Consequently, the signal received by each neuron is a weighted sum determined by the weights of its signal transmission paths and the applied signal values.
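A single neuron as described in paragraphs [0003] and [0004] can be sketched as follows. This is a minimal illustration, not the patented circuit: it assumes a sigmoid activation and assumes the neuron threshold is subtracted from the weighted input sum. The function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic sigmoid, one example of a monotonically increasing activation."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs: Sequence[float], weights: Sequence[float], threshold: float) -> float:
    """Output of one neuron (hypothetical helper).

    Each input signal is multiplied by the weight of its signal transmission
    path. The weighted sum, less the neuron threshold (an assumed convention),
    is passed through the sigmoid activation function.
    """
    if len(inputs) != len(weights):
        raise ValueError("expected one weight per signal transmission path")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum - threshold)
```

For example, `neuron_output([1, 0], [2.0, 5.0], threshold=2.0)` returns 0.5, because the weighted sum 2.0 exactly equals the threshold.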

[0005] The interconnectivity of the neurons in a neural network gives rise to behavior substantially more complex than that of individual neurons. This complex behavior is determined by which neurons have signal transmission paths connecting them, and by the respective values of the signal transmission path weights. Desired network behavior can be obtained by the appropriate selection of network topology and weight values. The process of selecting weight values to obtain a particular network characteristic is called training. Different neural network architectures and techniques for training them are described in Parallel Distributed Processing, Vol. 1, D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Editors, MIT Press, 1986.

[0006] Properly trained neural networks exhibit interesting and useful properties, such as pattern recognition functions. A neural network that has a suitable architecture and is properly trained will be able to generalize. For example, if an input signal is corrupted by noise, applying the noisy signal to a neural network trained to recognize the clean signal will cause the network to generate the appropriate output signal. Similarly, if the training signals share certain properties, applying an input signal that is not in the training set but has those shared properties will cause the network to generate the appropriate output signal. This ability to generalize has been a major factor in the current interest and intense activity in neural network research.

[0007] Trained neural networks having an inappropriate architecture for a particular problem do not always correctly generalize after being trained. They can exhibit an “over training” condition in which the input signals used for training will cause the network to generate the appropriate output signals, but an input signal not used for training, and having a shared property with the training set, will not cause the appropriate output signal to be generated. The emergent property of generalization is lost by over training.

[0008] It is an object of the invention to optimize the architecture of a neural network so that over training will not occur, and yet have a network architecture such that the trained network will exhibit the desired emergent property.

SUMMARY OF THE INVENTION

[0009] According to the invention a neural network is defined, and its architecture is represented by a symbol string. A set of input-output pairs for the network is provided, and the input-output pairs are divided into a training set and an evaluation set. The initially defined network is trained with the training set, and then evaluated with the evaluation set. The best performing networks are selected.

[0010] The symbol strings representing the selected network architectures are modified according to a genetic algorithm to generate new symbol strings representing new neural network architectures. These new neural network architectures are then trained by the training set, evaluated by the evaluation set, and the best performing networks are again selected. Symbol strings representative of improved networks are again modified according to the genetic algorithm and the process is continued until a sufficiently optimized network architecture is realized.

BRIEF DESCRIPTION OF THE DRAWING

[0011] The method and network architecture according to the invention are more fully described below in conjunction with the accompanying drawing, in which:

[0012] FIG. 1 illustrates the architecture of one kind of neural network having one hidden layer of neurons;

[0013] FIG. 2 illustrates the operation of a genetic algorithm;

[0014] FIGS. 3A-3C show how a neural network architecture is represented by a symbol string and how a genetic algorithm recombination operator changes the symbol string;

[0015] FIG. 4 illustrates the sequence of steps of the method according to the invention; and

[0016] FIG. 5 illustrates an optimized neural network architecture realized according to the invention.

DETAILED DESCRIPTION OF THE INVENTION

[0017] FIG. 1 illustrates a neural network of the feedforward type. The network comprises a layer 10 of input neurons 11-14, a hidden layer 20 of neurons 21-24 and a layer 30 of output neurons 31-34. In the network shown, the input neurons are connected to the intermediate (hidden) neurons by signal transmission paths 41, 42, . . . , and the intermediate neurons are connected to the output neurons by signal transmission paths 51, 52, . . . A more general form of feedforward network could also include signal transmission paths from the input neurons directly to the output neurons. For clarity, some of the possible signal transmission paths have been omitted from the drawing.

[0018] Initially, the neuron thresholds and the weights of the signal transmission paths are set to some random values. After training, the neuron thresholds and the path weights will have values such that the network will exhibit the performance for which it was trained.

[0019] The underlying theory of genetic algorithms is described in J. H. Holland, Adaptation in Natural and Artificial Systems, Univ. of Michigan Press, 1975. In carrying out the present invention, a symbol string is used to represent a neural network architecture. A genetic algorithm operates on the symbol strings, and by changing them it changes the architecture of the networks which they represent. The most frequently used symbols are binary, i.e. 0 and 1, but the algorithm is not restricted to binary symbols.

[0020] FIG. 2 shows the steps in the genetic algorithm. An initial population P(i=0) of symbol strings representing an initial population of neural networks is defined. The population is evaluated according to some criterion, and the fittest members, i.e. those closest to meeting the evaluation criterion, are selected. If the fitness of the selected members meets the criterion, the algorithm stops. Otherwise, genetic recombination operators are applied to the symbol strings representing the population to create a new population P(i+1), and i is incremented. The steps are repeated until the selection criterion is met and the algorithm halts.
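The loop of FIG. 2 can be sketched generically as below. This is a minimal illustration rather than the patented procedure itself: the fitness function, the halting test and the operator that builds P(i+1) are passed in by the caller. The `max_generations` cap is an added assumption, since FIG. 2 as described has no such cap.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Chromosome = str  # a string of symbols, e.g. "0110..."


def run_genetic_algorithm(
    population: List[Chromosome],
    fitness: Callable[[Chromosome], float],
    criterion_met: Callable[[float], bool],
    next_population: Callable[[List[Chromosome], Sequence[float]], List[Chromosome]],
    max_generations: int = 100,  # assumed safety cap, not part of FIG. 2
) -> Tuple[Optional[Chromosome], float]:
    """Evaluate P(i); halt if the fittest member meets the criterion,
    otherwise apply the genetic operators to obtain P(i+1)."""
    best, best_score = None, float("-inf")
    for _ in range(max_generations):
        scores = [fitness(c) for c in population]          # evaluate P(i)
        for c, s in zip(population, scores):               # keep the fittest member seen so far
            if s > best_score:
                best, best_score = c, s
        if criterion_met(best_score):                      # criterion satisfied: halt
            break
        population = next_population(population, scores)   # recombination/mutation -> P(i+1)
    return best, best_score
```

As a toy usage example, with fitness equal to the number of `'1'` symbols and an operator that flips one `'0'` per generation, `run_genetic_algorithm(["000"], lambda c: c.count("1"), lambda s: s >= 3, lambda pop, _: [c.replace("0", "1", 1) for c in pop])` halts after four evaluations and returns `("111", 3)`.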

[0021] FIGS. 3A-3C illustrate how a neural network architecture is represented by a symbol string, and how a recombination operator of the genetic algorithm operates on the strings to change the population of neural networks represented by the strings. The field of genetic algorithms has adopted biological terminology, and that terminology will be used in the following discussion.

[0022] In FIG. 3A various neural network parameters are mapped into a binary sequence. The entire binary sequence is referred to as a chromosome, and the particular substrings of the chromosome into which the network parameters are mapped are referred to as genes. The network parameters represented by the genes, and the mapping or representation used for each gene are discussed below in connection with the example given. For now, it is sufficient to understand that the chromosome represents a network having an architecture with the parameters (and the parameter values) represented by the genes. Different chromosomes represent different network architectures.

[0023] FIG. 3B illustrates a pair of chromosomes to which the genetic recombination operator will be applied. These two chromosomes are referred to as parents, and each represents a different network architecture.

[0024] An arbitrary position along the two chromosome strings, called the crossover point, is selected. The parent chromosomes are severed at the crossover point, and the resulting substrings are recombined as follows. The first substring of the first parent chromosome and the second substring of the second parent chromosome are combined to form a new chromosome, called an offspring. Likewise, the first substring of the second parent chromosome and the second substring of the first parent chromosome are combined to form another offspring.

[0025] The offspring chromosomes are shown in FIG. 3C. The position where the substrings of the severed parent chromosomes were joined to form the offspring, i.e. the crossover point, is marked with a colon. The colon is not part of the chromosome string; it merely shows where the parents were severed and the resulting substrings recombined to form the offspring. Repeated application of this recombination operator to a pair of chromosomes and their offspring can generate a very large number of different chromosomes and corresponding neural network architectures. In addition, the genetic algorithm includes a mutation operator that causes random bit changes in the chromosomes.
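The single-point recombination of FIGS. 3B and 3C and the mutation operator can be sketched on binary strings as follows. This is a minimal sketch: the function names, the uniform choice of a crossover point strictly inside the string, and the independent per-bit mutation rate are assumptions, since the text does not specify them.

```python
import random
from typing import Tuple


def crossover(parent_a: str, parent_b: str, point: int) -> Tuple[str, str]:
    """Sever both parents at `point` and swap the tails to form two offspring."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < point < len(parent_a):
        raise ValueError("crossover point must fall strictly inside the string")
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def random_crossover(parent_a: str, parent_b: str, rng: random.Random) -> Tuple[str, str]:
    """Single-point crossover at an arbitrary position (assumed uniform)."""
    point = rng.randint(1, len(parent_a) - 1)
    return crossover(parent_a, parent_b, point)


def mutate(chromosome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate` (assumed scheme)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )
```

For example, `crossover("1111", "0000", 2)` returns `("1100", "0011")`; in the notation of FIG. 3C these offspring would be written `11:00` and `00:11`.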

[0026] The method according to the invention can be understood with reference to FIG. 4. An initial population of networks is trained by a training set of input-output pairs, and each trained network is evaluated by an evaluation set of input-output pairs. Networks are selected for reproduction stochastically according to their performance evaluation relative to the population average. The chromosomes representing the architectures of the selected networks are then modified by the genetic algorithm to create offspring representing new network architectures, and networks having the new architectures are constructed. These networks are then trained and evaluated. The best-performing networks are again modified according to the genetic algorithm, and the process is continued until a specified performance criterion is met.
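Stochastic selection "according to performance relative to the population average" is commonly implemented as fitness-proportional (roulette-wheel) selection. When as many parents are drawn as there are members, each member's expected number of copies is its fitness divided by the mean fitness. The sketch below assumes that scheme. Because the Example scores networks by mean square error (lower is better), it also assumes a hypothetical conversion, fitness = 1/(ε + MSE). Neither choice is stated in the text.

```python
import random
from typing import List, Sequence


def mse_to_fitness(mse: float, eps: float = 1e-6) -> float:
    """Hypothetical conversion: lower evaluation error gives higher fitness."""
    return 1.0 / (eps + mse)


def roulette_select(population: Sequence[str], fitnesses: Sequence[float],
                    k: int, rng: random.Random) -> List[str]:
    """Draw k parents with probability proportional to fitness.

    With k == len(population), each member is expected to be drawn
    fitness / mean_fitness times.
    """
    if len(population) != len(fitnesses):
        raise ValueError("one fitness value per member")
    if any(f < 0 for f in fitnesses):
        raise ValueError("fitness values must be non-negative")
    if sum(fitnesses) == 0:
        return rng.choices(population, k=k)  # no information: choose uniformly
    return rng.choices(population, weights=fitnesses, k=k)
```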

EXAMPLE

[0027] An example of the invention was carried out by digital computer simulation. The task to be learned by the neural network was a pattern discrimination task called the minimum interesting coding problem. The input to the neural network consisted of four binary signals, of which the first two were noise having no relation to the output pattern. The remaining two represented an integer in ordinary (power-of-two weighted) binary, and the two output signals were to be the Gray code of that integer.

[0028] Network training was by the back propagation method. For a complete description of back propagation, see Parallel Distributed Processing, Vol. 1, supra, pp. 318-362. The network architecture was represented by a sixteen-bit binary string, as shown in FIG. 3A. The first two bits represented the back propagation learning rate (η), the next two the back propagation momentum (α), and the next two the range of initial path weights (W). Two sets of five bits followed, each representing a hidden layer. Flag bit F1 indicated whether the first hidden layer was present, and the remaining four bits, N1, represented the number of neurons in that layer. Similarly, flag bit F2 indicated whether the second hidden layer was present, and the next four bits represented the number of neurons N2 in the second hidden layer. Thus, the representation could produce networks as large as two hidden layers of sixteen neurons each and as small as no hidden layers. The particular representations used for the network parameters are as follows.
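The sixteen-bit layout can be made concrete with a small decoder. This is a sketch under assumptions: bit 0 is taken to be the leftmost symbol, and the field names are hypothetical. The text also does not say how N1 and N2 are coded, so a present hidden layer is assumed here to hold 1 + (binary value of its four N bits) neurons. That assumption is consistent with the stated maximum of sixteen neurons per layer.

```python
from typing import Dict, List

# Assumed field order and widths (2+2+2+1+4+1+4 = 16 bits), leftmost bit first.
FIELDS = [("eta", 2), ("alpha", 2), ("weight_range", 2),
          ("F1", 1), ("N1", 4), ("F2", 1), ("N2", 4)]


def split_genes(chromosome: str) -> Dict[str, str]:
    """Cut a 16-bit chromosome into its genes."""
    if len(chromosome) != 16 or set(chromosome) - {"0", "1"}:
        raise ValueError("expected a 16-symbol binary string")
    genes, pos = {}, 0
    for name, width in FIELDS:
        genes[name] = chromosome[pos:pos + width]
        pos += width
    return genes


def hidden_layer_sizes(genes: Dict[str, str]) -> List[int]:
    """Sizes of the hidden layers whose flag bit is set.

    Assumption: a present layer holds 1 + int(N, 2) neurons, i.e. 1 to 16.
    """
    sizes = []
    for flag, count in (("F1", "N1"), ("F2", "N2")):
        if genes[flag] == "1":
            sizes.append(1 + int(genes[count], 2))
    return sizes
```

Under these assumptions, `hidden_layer_sizes(split_genes("0001101000011111"))` returns `[1, 16]`: a one-neuron first hidden layer followed by a sixteen-neuron second hidden layer.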

[0029] Learning rate η = 1/2^n, where n = 1 + the Gray-code value of the η gene, so that n = 1, 2, 3 or 4. Thus, η = 0.5, 0.25, 0.125 or 0.0625.

[0030] Momentum α = 1 − n/10, where n = 1 + the Gray-code value of the α gene, so that n = 1, 2, 3 or 4. Thus, α = 0.9, 0.8, 0.7 or 0.6.

[0031] Initial weight range W = 2/2^n, where n = 1 + the Gray-code value of the W gene, so that n = 1, 2, 3 or 4. Thus, W = 1.0, 0.5, 0.25 or 0.125. Alternatively, W = 2/2^n − constant.
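The mappings of paragraphs [0029] to [0031] can be checked with a short decoder. It assumes that each two-bit gene is read as reflected binary Gray code with the leftmost symbol most significant. The function names are hypothetical, and the optional constant in the alternative weight formula is omitted.

```python
from typing import Tuple


def gray_to_int(bits: str) -> int:
    """Decode a reflected binary Gray code string, most significant symbol first."""
    value = 0
    for bit in bits:
        # each binary bit equals the Gray bit XOR the previous binary bit
        value = (value << 1) | (int(bit) ^ (value & 1))
    return value


def backprop_parameters(eta_gene: str, alpha_gene: str, w_gene: str) -> Tuple[float, float, float]:
    """Map the three 2-bit genes to (learning rate, momentum, initial weight range)."""
    n_eta = 1 + gray_to_int(eta_gene)      # n = 1..4
    n_alpha = 1 + gray_to_int(alpha_gene)
    n_w = 1 + gray_to_int(w_gene)
    eta = 1.0 / 2 ** n_eta                 # 0.5, 0.25, 0.125, 0.0625
    alpha = 1.0 - n_alpha / 10             # 0.9, 0.8, 0.7, 0.6
    w = 2.0 / 2 ** n_w                     # 1.0, 0.5, 0.25, 0.125
    return eta, alpha, w
```

The Gray codes `00`, `01`, `11`, `10` decode to 0, 1, 2, 3. So `backprop_parameters("00", "00", "00")` gives (0.5, 0.9, 1.0), and `backprop_parameters("10", "10", "10")` gives (0.0625, 0.6, 0.125), up to floating-point rounding.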

[0032] The neurons used a simple sigmoid activation function, whose exact form is not critical. The number of input nodes was fixed at four and the number of output nodes at two, to match the problem.

[0033] The input-output pairs of the network are shown in the following Table I.

TABLE I
The Minimum Interesting Coding Problem

input   output
0000    00
1100    00
1001    01
1101    01
0010    11
0110    11
0011    10
1011    10
0100    00
1000    00
0001    01
0101    01
1010    11
1110    11
0111    10
1111    10
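The rule behind Table I can be checked mechanically. Under that rule, the first two input bits are noise, and the last two form an ordinary binary integer whose two-bit Gray code is the target output. The helper names below are hypothetical.

```python
# Table I as (input, output) pairs, in the order listed.
TABLE_I = [
    ("0000", "00"), ("1100", "00"), ("1001", "01"), ("1101", "01"),
    ("0010", "11"), ("0110", "11"), ("0011", "10"), ("1011", "10"),
    ("0100", "00"), ("1000", "00"), ("0001", "01"), ("0101", "01"),
    ("1010", "11"), ("1110", "11"), ("0111", "10"), ("1111", "10"),
]


def gray_code(value: int, width: int = 2) -> str:
    """Reflected binary Gray code of a non-negative integer."""
    return format(value ^ (value >> 1), f"0{width}b")


def target_output(inputs: str) -> str:
    """Ignore the two noise bits and Gray-code the integer in the last two bits."""
    return gray_code(int(inputs[2:], 2))


def table_is_consistent() -> bool:
    return all(target_output(x) == y for x, y in TABLE_I)
```

`table_is_consistent()` returns `True`. The sixteen inputs are exactly the sixteen four-bit patterns, so the last eight entries share no input with the first eight.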

[0034] The method was applied using only the first eight entries of Table I. One of these eight entries was chosen at random and reserved as the evaluation set, and the remaining seven entries were used as the training set. Each network was trained until either the sum of squared errors decreased to the preset value of 0.10 or a prespecified number of presentations of a training pair with back propagation (2,000 in this example) had been made. Once a network was trained, the one-pair evaluation set was applied to it, and the resulting mean square error was used as an estimate of its ability to generalize. The networks with the lowest mean square error were selected, the strings representing their architectures were modified according to the genetic algorithm, and the process was repeated.
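The evaluation step of paragraph [0034] can be outlined as below. It is a sketch only: the back propagation update and the error computations are supplied by the caller through hypothetical callbacks (`present_pair`, `training_sse`, `predict`). It also assumes that training pairs are presented cyclically and that the stopping test is checked before each presentation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (input bits, target output bits)


def split_pairs(pairs: Sequence[Pair], rng: random.Random) -> Tuple[List[Pair], List[Pair]]:
    """Reserve one randomly chosen pair for evaluation; the rest form the training set."""
    held_out = rng.randrange(len(pairs))
    training = [p for i, p in enumerate(pairs) if i != held_out]
    return training, [pairs[held_out]]


def train(present_pair: Callable[[Pair], None],
          training_sse: Callable[[], float],
          training: Sequence[Pair],
          target_sse: float = 0.10,
          max_presentations: int = 2000) -> int:
    """Present pairs (one back propagation step each, via the hypothetical
    callback) until the training-set sum of squared errors reaches
    target_sse or the presentation budget is used up."""
    presentations = 0
    while presentations < max_presentations and training_sse() > target_sse:
        present_pair(training[presentations % len(training)])
        presentations += 1
    return presentations


def evaluation_mse(predict: Callable[[str], Sequence[float]], evaluation: Sequence[Pair]) -> float:
    """Mean square error of the network outputs on the held-out pairs."""
    errors = [(out - int(t)) ** 2
              for x, target in evaluation
              for out, t in zip(predict(x), target)]
    return sum(errors) / len(errors)
```

The returned error would then be converted into a fitness score for selection, with lower error meaning a fitter architecture.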

[0035] After approximately 1,000 individual network architectures had been produced and tested, the genetic algorithm's population had converged on several network properties. All of the network architectures in the final population had two hidden layers, and the majority of them (19 of 30) had only a single neuron in the first hidden layer. The most prevalent single architecture is shown in FIG. 5. The path weights and neuron thresholds for one example are shown in the drawing.

[0036] At first sight, this architecture would seem unable to solve the problem: there are four distinct classes of input patterns, and the network channels all input information through a single neuron. However, the activation level of the bottleneck neuron takes a different value for each of the four classes, and the low weights on the two noise inputs show that the network learned to ignore them. The connections above the bottleneck then recode this activation level into the required output.
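Hand-picked weights can illustrate why a single bottleneck neuron is enough. The sketch below is not the learned network of FIG. 5. The weights, the thresholds, the use of step functions above the bottleneck and the width of three neurons for the second hidden layer are all assumptions chosen for readability. The bottleneck neuron gives zero weight to the two noise inputs and unequal weights to the two code bits, so its sigmoid output takes four distinct levels. The next layer slices those levels, and the output layer recodes them into the Gray code.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def step(z: float) -> int:
    return 1 if z >= 0 else 0


def bottleneck_net(bits: str) -> Tuple[int, int]:
    """Hand-built 4-1-3-2 network; illustrative weights, not those of FIG. 5."""
    x1, x2, x3, x4 = (int(b) for b in bits)
    # Bottleneck neuron: noise inputs x1, x2 weighted 0; code bits weighted 2 and 1.
    # Its output is about 0.18, 0.38, 0.62 or 0.82 for the integers 0..3.
    h = sigmoid(0.0 * x1 + 0.0 * x2 + 2.0 * x3 + 1.0 * x4 - 1.5)
    # Second hidden layer: threshold units placed between adjacent levels.
    s1, s2, s3 = step(h - 0.28), step(h - 0.50), step(h - 0.72)
    # Output layer: recode the level into the two Gray-code bits.
    g1 = step(s2 - 0.5)
    g0 = step(s1 - s3 - 0.5)
    return g1, g0
```

Applied to all sixteen inputs of Table I, including the eight never used in training, this network reproduces every target output. It can do so because it ignores the noise bits entirely.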

[0037] Finally, the best architectures produced were repeatedly trained on all of the first eight table entries and then tested on the eight table entries they had never seen. This was repeated fifty times, and the total sum-squared error on the test set was determined, together with the number of times at least one of the eight test cases was incorrectly classified. For comparison, the same procedure was performed on the full network architecture with two hidden layers of sixteen neurons each, trained using back propagation learning only. The results are shown in the following Table II.

TABLE II

criterion                        full network   evolved network
total error (mean)               0.675          0.207
total error (standard error)     0.056          0.034
error-free tests                 19/50          48/50

[0038] A t-test of the mean difference in total error is significant (p < 0.001). Clearly, the full network architecture exhibits overspecificity to the training set, i.e. overtraining, and it generalizes poorly. In contrast, the severely restricted architecture found by the genetic algorithm performs substantially better and generalizes much better.
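As a rough check on the reported significance, a t statistic can be computed from the Table II summary values alone. The check assumes that the two sets of fifty runs are independent and that the listed standard errors are standard errors of the mean. The text does not state which form of t-test was used.

```latex
t \approx \frac{\bar{E}_{\mathrm{full}} - \bar{E}_{\mathrm{evolved}}}
               {\sqrt{SE_{\mathrm{full}}^{2} + SE_{\mathrm{evolved}}^{2}}}
  = \frac{0.675 - 0.207}{\sqrt{0.056^{2} + 0.034^{2}}}
  = \frac{0.468}{\sqrt{0.004292}}
  \approx \frac{0.468}{0.0655}
  \approx 7.1
```

At these sample sizes, the two-sided critical value for p = 0.001 is roughly 3.4. A value near 7 is far above it, consistent with the significance level stated above.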

[0039] The preferred embodiment disclosed is a feedforward neural network. It is contemplated that the invention covers other types of optimized networks such as networks having feedback, networks without hidden neurons and other network configurations. Additionally, the genetic algorithm could use a more elaborate recombination operator such as one with more than one crossover point. Accordingly, the particular example given above should not be construed as limiting the scope of the invention which is defined by the following claims. 

What is claimed is:
 1. A method of optimizing the structure of an artificial neural network, comprising: defining a neural network having an initial architecture comprised of a plurality of neurons; defining a symbol string representing the architecture of the neural network; providing a set of neural network inputs including a set for training and a set for evaluation; training the neural network using the training set of inputs; evaluating the trained neural network using the evaluation set of inputs; modifying the symbol string representation of the neural network architecture according to a genetic algorithm; successively training and evaluating the neural networks represented by the modified symbol strings and modifying the symbol strings representing improved neural networks.
 2. A method according to claim 1, further comprising the step of selecting the fittest of the evaluated trained networks according to a defined criterion; and carrying out the step of modifying according to a genetic algorithm on the selected fittest networks.
 3. A method according to claim 2, wherein the step of training the neural network is carried out by supervised learning.
 4. A method according to claim 1, wherein the step of training the neural network is carried out by supervised learning.
 5. A method according to claim 2, wherein the symbol string representation of the neural network architecture represents the number of layers of hidden neurons, and the number of neurons within each hidden layer of the network.
 6. A method according to claim 5, wherein the network is trained by back propagation, and the symbol string representation of the neural network architecture further represents the back propagation parameters of learning rate, momentum and dispersion of initial link weights.
 7. An artificial neural network optimized according to claim 1.
 8. An artificial neural network optimized according to claim 2.
 9. An artificial neural network optimized according to claim 3.
 10. An artificial neural network optimized according to claim 4.
 11. An artificial neural network optimized according to claim 5.
 12. An artificial neural network optimized according to claim 6.
 13. A method of optimizing the structure of an artificial neural network, comprising: defining a neural network having an initial architecture comprised of a plurality of input neurons, output neurons and hidden neurons, and a plurality of signal transmission paths for applying output signals from said input neurons to said hidden neurons and for applying output signals from said hidden neurons to said output neurons; defining a symbol string representing the architecture of the neural network; providing a set of neural net input-output pairs including a set for training and a set for evaluation; training the neural network by supervised learning using the training set of input-output pairs; evaluating the trained neural network using the evaluation set of input-output pairs; modifying the symbol string representation of the neural network architecture according to a genetic algorithm; successively training and evaluating the neural networks represented by the modified symbol strings and modifying the symbol strings representing improved neural networks.
 14. A method according to claim 13, wherein the step of training by supervised learning is carried out by a back propagation algorithm.
 15. A method according to claim 14, wherein the symbol string representation of the neural network architecture represents the number of intermediate neurons.
 16. A method according to claim 15, wherein the symbol string representation of the neural network architecture further represents the back propagation parameters of learning rate, momentum and dispersion of initial link weights.
 17. An artificial neural network optimized according to claim 13.
 18. An artificial neural network optimized according to claim 14.
 19. An artificial neural network optimized according to claim 15.
 20. An artificial neural network optimized according to claim 16.
 21. An optimized artificial neural network, comprising: a plurality of input neurons for receiving network input signals and for developing output signals in response thereto; a plurality of output neurons receptive of signals for developing network output signals; a plurality of hidden neurons which receive input signals and develop output signals; signal transmission means comprised of a plurality of signal paths for applying output signals from said input neurons to said hidden neurons and for applying output signals from said hidden neurons to said output neurons; and the number of hidden neurons, the neuron threshold functions and the signal path weights having values optimized by supervised learning and network modification by a genetic algorithm.
 22. A neural network according to claim 21, wherein one hidden neuron layer has only one neuron. 